Pseudomonas sp. strain M1 is a soil isolate with remarkable biotechnological potential. The genome of Pseudomonas sp. M1 was 
sequenced using both 454 and Illumina technologies. A customized genome assembly pipeline was used to reconstruct its genome 
sequence as a single scaffold. 
Pseudomonas sp. strain M1, isolated from the Rhine River (1), is 
able to utilize several toxic and/or recalcitrant compounds, 
such as myrcene (1, 2), citral, citronellol, phenol (3-5), chloro- 
phenols, and benzene (5), as its sole carbon and energy sources. 
However, the molecular mechanisms of strain M1 that are associated 
with the utilization of these (and other) less common carbon 
sources are still poorly understood. To provide a foundation for 
exploring the biotransformation potential of Pseudomonas sp. 
M1, its genome was sequenced using both 454 FLX and Illumina 
Genome Analyzer IIx next-generation sequencing technologies. 
The 454 FLX sequencing technology yielded a data set of 264,177 
single reads with an average read length of 523 bp, whereas the 
Illumina technology was used to produce two data sets of 50-bp 
reads: (i) 5,303,579 paired-end reads with an estimated insert size of 
about 320 bp; and (ii) 5,478,608 mate-pair reads with an estimated 
insert size of about 5,200 bp. Adapter sequences were removed and 
quality trimming was performed on all data sets 
prior to de novo assembly. To reconstruct the genome of Pseudomonas 
sp. M1, a customized pipeline was set up, based on preliminary 
comparative trials using different genome assemblers. First, 
454 FLX single reads were assembled with Newbler v2.6 (6), generating 
379 contigs (minimum contig size of 1,000 bp) with a total 
size of 6,860,386 bp. Second, the 454 FLX-generated contigs were 
used as a genome backbone to produce eight scaffolds using both 
Illumina libraries in SSPACE v2.0 (7). The resulting scaffolds 
included over 200 gaps, a substantial number of which were filled by 
combining local alignment with GapFiller (8) and GapCloser (9). To 
obtain a more contiguous and accurate genome sequence, a further 
sequential run of SSPACE (7), GapFiller (8) and GapCloser 
(9), and Anchor v0.3.1 (http://www.bcgsc.ca/platform/bioinfo/software/anchor) 
was performed, resulting in a single scaffold representing 
the genome of Pseudomonas sp. M1. Sequence accuracy was 
further improved by manual curation (based on 
read alignment analysis) and by Sanger sequencing, which was used to 
confirm the sequence and to close several gaps. Nonetheless, 
nine repeat-rich regions were not fully resolved. As a 
whole, the current draft of the Pseudomonas sp. M1 genome is 
composed of nine contigs organized in a single scaffold, with a 
total size of 6,958,606 bp (including 1,753 N's) and an estimated 
G+C content of 67.3%. This genome sequence was annotated 
using Prokka v1.5.2 (http://www.vicbioinformatics.com/software.prokka.shtml) 
and deposited at DDBJ/EMBL/GenBank. The annotated 
genome includes 6,053 coding sequences (CDSs), 12 
rRNAs (four copies each of 5S, 16S, and 23S rRNA), 66 tRNAs, 
and 804 hypothetical proteins. 
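
The summary figures reported above for the draft scaffold (total size, number of N's, number of contigs, and G+C content) are the kind of values that can be recomputed directly from the deposited sequence. The following is a minimal sketch of such a check, not the procedure used for this genome: the function name `scaffold_stats` is hypothetical, the sequence is assumed to be available as a plain string, contigs are assumed to be the stretches separated by runs of N, and G+C content is computed over unambiguous A/C/G/T bases only.

```python
import re


def scaffold_stats(seq: str) -> dict:
    """Summarize one scaffold sequence (hypothetical helper, illustration only)."""
    seq = seq.upper()
    n_count = seq.count("N")
    # Assumption: contigs are the stretches between runs of N placeholders.
    contigs = [c for c in re.split(r"N+", seq) if c]
    gc = seq.count("G") + seq.count("C")
    # Assumption: G+C is reported over unambiguous bases only.
    acgt = gc + seq.count("A") + seq.count("T")
    return {
        "total_size": len(seq),  # includes the N's
        "n_count": n_count,
        "contigs": len(contigs),
        "gc_percent": round(100.0 * gc / acgt, 1) if acgt else 0.0,
    }


if __name__ == "__main__":
    # Toy scaffold: two contigs joined by a 3-base gap.
    print(scaffold_stats("GGCCATNNNGCGCAT"))
```

Applied to the full scaffold sequence, a check of this kind should give values comparable to those reported here, although small differences can arise from how ambiguous bases and gap runs are counted.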

Further inspection of the genome of Pseudomonas sp. M1 revealed 
the presence of a significant number of biotechnologically 
interesting enzymes (e.g., 74 oxygenases/hydroxylases) whose 
functionality may be fine-tuned using systems and/or synthetic 
biology approaches. 

Nucleotide sequence accession numbers. This Whole Genome 
Shotgun project (BioProject: PRJNA62721) has been deposited 
at DDBJ/EMBL/GenBank under the accession number 
ANIR00000000. The version described in this article is the first 
version, ANIR01000000. 